It's Time to Consider "Time" when Evaluating Recommender-System Algorithms [Proposal]
In this position paper, we question the current practice of calculating
evaluation metrics for recommender systems as single numbers (e.g. precision
p=.28 or mean absolute error MAE = 1.21). We argue that single numbers express
only average effectiveness over a usually rather long period (e.g. a year or
even longer), which provides only a vague and static view of the data. We
propose that recommender-system researchers should instead calculate metrics
for shorter time spans, such as weeks or months, and plot the results in, e.g.,
a line chart. This way, results show how an algorithm's effectiveness develops
over time, and hence allow drawing more meaningful conclusions about how the
algorithm will perform in the future. In this paper, we explain our reasoning,
provide an illustrative example, and present suggestions for what the
community should do next.
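The proposal can be sketched in a few lines: rather than one dataset-wide number, compute the metric per month and inspect the resulting series. The log entries and the pure-Python grouping below are illustrative assumptions, not taken from the paper.

```python
# Sketch: per-month click-through rate instead of one overall average.
from collections import defaultdict

log = [  # hypothetical (month, clicked) pairs, one per displayed recommendation
    ("2017-03", 1), ("2017-03", 0),
    ("2017-04", 0), ("2017-04", 1),
    ("2017-05", 1),
]

overall_ctr = sum(c for _, c in log) / len(log)   # the usual single number

clicks, shown = defaultdict(int), defaultdict(int)
for month, clicked in log:
    clicks[month] += clicked
    shown[month] += 1

# One CTR value per month -- ready to plot as a line chart over time.
monthly_ctr = {m: clicks[m] / shown[m] for m in sorted(shown)}
```

The monthly series reveals trends (e.g. a steadily declining CTR) that the single overall number hides.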
An Empirical Comparison of Syllabuses for Curriculum Learning
Syllabuses for curriculum learning have been developed on an ad-hoc, per-task
basis, and little is known about the relative performance of different
syllabuses. We identify a number of syllabuses used in the literature. We
compare the identified syllabuses based on their effect on the speed of
learning and generalization ability of an LSTM network on three sequential
learning tasks. We find that the choice of syllabus has limited effect on the
generalization ability of a trained network. In terms of speed of learning our
results demonstrate that the best syllabus is task dependent but that a
recently proposed automated curriculum learning approach, Predictive Gain,
performs very competitively against all identified hand-crafted syllabuses. The
best performing hand-crafted syllabus which we term Look Back and Forward
combines a syllabus which steps through tasks in the order of their difficulty
with a uniform distribution over all tasks. Our experimental results provide an
empirical basis for the choice of syllabus on a new problem that could benefit
from curriculum learning. Additionally, insights derived from our results shed
light on how to successfully design new syllabuses.
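A minimal sketch of how such a combined syllabus could look. The function name, the 20% uniform-mixing rate, and the fixed number of steps per task are illustrative assumptions; the paper's exact schedule may differ.

```python
import random

def look_back_and_forward(num_tasks, steps_per_task, p_uniform=0.2, seed=0):
    """Sketch of a 'Look Back and Forward'-style syllabus: step through the
    tasks in order of difficulty, but mix in a uniform distribution over all
    tasks so earlier tasks are revisited ('look back') and harder ones are
    previewed ('look forward')."""
    rng = random.Random(seed)
    syllabus = []
    for current in range(num_tasks):       # tasks indexed by difficulty
        for _ in range(steps_per_task):
            if rng.random() < p_uniform:   # uniform component over all tasks
                syllabus.append(rng.randrange(num_tasks))
            else:                          # difficulty-ordered component
                syllabus.append(current)
    return syllabus
```

With `p_uniform=0.0` this degenerates to a plain difficulty-ordered syllabus, which makes the role of the uniform component easy to see.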
Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems
Many recommendation algorithms are available to digital library recommender
system operators. However, how effective these algorithms are in online
evaluations is largely unreported. We compare a standard term-based approach
to two promising approaches for related-article recommendation in digital
libraries: document embeddings, and keyphrases. We evaluate the consistency of
their performance across multiple scenarios. Through our
recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users
of Sowiport and Jabref over the course of 19 months, from March 2017 to October
2018. The effectiveness of the algorithms differs significantly between
Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). There is a ~400%
difference in effectiveness between the best and worst algorithm in each of
the two scenarios. The best performing algorithm in Sowiport (terms) is the
worst performing in Jabref. The best performing algorithm in Jabref
(keyphrases) performs 70% worse in Sowiport than Sowiport's best algorithm
(click-through rate: 0.1% for terms vs. 0.03% for keyphrases).
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
For the past few years, we used Apache Lucene as the recommendation framework in
our scholarly-literature recommender system of the reference-management
software Docear. In this paper, we share three lessons learned from our work
with Lucene. First, recommendations with relevance scores below 0.025 tend to
have significantly lower click-through rates than recommendations with
relevance scores above 0.025. Second, when we picked ten recommendations
randomly from Lucene's top-50 search results, the click-through rate decreased
by 15% compared to recommending the top-10 results. Third, the number of
returned search results tends to predict how high click-through rates will be:
when Lucene returns fewer than 1,000 search results, click-through rates tend
to be around half as high as when 1,000 or more results are returned.
Comment: Accepted for publication at the 5th International Workshop on
Bibliometric-enhanced Information Retrieval (BIR 2017).
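The first two lessons translate directly into a post-processing step over Lucene's result list. A minimal sketch, assuming results arrive as (doc_id, relevance_score) pairs; only the 0.025 threshold and the top-10 preference come from the paper.

```python
# Post-processing sketch for lessons 1 and 2.
SCORE_THRESHOLD = 0.025  # below this, CTR dropped significantly (lesson 1)

def pick_recommendations(results, n=10):
    """Recommend the top-n results by relevance score (picking randomly from
    the top 50 lowered CTR by 15%, lesson 2) and drop low-relevance hits."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [(doc, score) for doc, score in ranked[:n]
            if score >= SCORE_THRESHOLD]
```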
Real-World Recommender Systems for Academia: The Pain and Gain in Building, Operating, and Researching them [Long Version]
Research on recommender systems is a challenging task, as is building and
operating such systems. Major challenges include non-reproducible research
results, dealing with noisy data, and answering many questions such as how many
recommendations to display, how often, and, of course, how to generate
recommendations most effectively. In the past six years, we built three
research-article recommender systems for digital libraries and reference
managers, and conducted research on these systems. In this paper, we share some
of the experiences we gained during that time. Among other things, we discuss
the skills required to build recommender systems, and why the literature provides little
help in identifying promising recommendation approaches. We explain the
challenge in creating a randomization engine to run A/B tests, and how low data
quality impacts the calculation of bibliometrics. We further discuss why
several of our experiments delivered disappointing results, and provide
statistics on how many researchers showed interest in our recommendation
dataset.
Comment: This article is a long version of the article published in the
Proceedings of the 5th International Workshop on Bibliometric-enhanced
Information Retrieval (BIR).
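One common way to build the kind of randomization engine mentioned above is deterministic hash-based assignment, so a returning user always sees the same A/B variant without any stored state. This is a generic sketch, not the authors' actual engine.

```python
import hashlib

def assign_variant(user_id, variants):
    """Deterministic A/B assignment: sha256 (unlike Python's built-in hash)
    is stable across processes, so a user id always maps to the same
    variant. A generic sketch of a randomization engine, not the paper's."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```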
Implementing Neural Turing Machines
Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural
Networks, a new class of recurrent neural networks which decouple computation
from memory by introducing an external memory unit. NTMs have demonstrated
superior performance over Long Short-Term Memory Cells in several sequence
learning tasks. A number of open source implementations of NTMs exist but are
unstable during training and/or fail to replicate the reported performance of
NTMs. This paper presents the details of our successful implementation of an
NTM. Our implementation learns to solve three sequential learning tasks from
the original NTM paper. We find that the choice of memory contents
initialization scheme is crucial in successfully implementing an NTM. Networks
with memory contents initialized to small constant values converge, on average,
twice as fast as the next-best memory-contents initialization scheme.
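The crucial initialization finding can be sketched as follows; the constant 1e-6 and the function signature are illustrative assumptions (the abstract reports only that small constant values work best).

```python
import numpy as np

def init_memory(memory_slots, slot_width, value=1e-6):
    """Initialize the NTM's external memory matrix to a small constant,
    the scheme the paper found crucial for convergence. The exact constant
    is an assumption; the paper reports only that small constant values
    converged about twice as fast as the next-best scheme."""
    return np.full((memory_slots, slot_width), value)
```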
Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps
While user-modeling and recommender systems successfully utilize items like
emails, news, and movies, they largely neglect mind maps as a source for user
modeling. We consider this a serious shortcoming, since we assume user modeling
based on mind maps to be as effective as user modeling based on other
items. Hence, millions of mind-mapping users could benefit from user-modeling
applications such as recommender systems. The objective of this doctoral thesis
is to develop an effective user-modeling approach based on mind maps. To
achieve this objective, we integrate a recommender system in our mind-mapping
and reference-management software Docear. The recommender system builds user
models based on the mind maps, and recommends research papers based on the user
models. As part of our research, we identify several variables relating to
mind-map-based user modeling, and evaluate the variables' impact on
user-modeling effectiveness with an offline evaluation, a user study, and an
online evaluation based on 430,893 recommendations displayed to 4,700 users. We
find, among others, that the number of analyzed nodes, modification time,
visibility of nodes, relations between nodes, and number of children and
siblings of a node affect the effectiveness of user modeling. When all
variables are combined in a favorable way, this novel approach achieves
click-through rates of 7.20%, which is nearly twice as effective as the best
baseline. In addition, we show that user modeling based on mind maps performs
about as well as user modeling based on other items, namely the research
articles users downloaded or cited. Our findings lead us to conclude that user
modeling based on mind maps is a promising research field, and that developers
of mind-mapping applications should integrate recommender systems into their
applications. Such systems could create additional value for millions of
mind-mapping users.
Comment: PhD thesis, Otto-von-Guericke University Magdeburg, Germany.
One-at-a-time: A Meta-Learning Recommender-System for Recommendation-Algorithm Selection on Micro Level
The effectiveness of recommendation algorithms is typically assessed with
evaluation metrics such as root mean square error, F1, or click-through rates,
calculated over entire datasets. The best algorithm is typically chosen based
on these overall metrics. However, there is no single-best algorithm for all
users, items, and contexts. Choosing a single algorithm based on overall
evaluation results is not optimal. In this paper, we propose a
meta-learning-based approach to recommendation, which aims to select the best
algorithm for each user-item pair. We evaluate our approach using the MovieLens
100K and 1M datasets. Our approach (RMSE, 100K: 0.973; 1M: 0.908) did not
outperform the single-best algorithm, SVD++ (RMSE, 100K: 0.942; 1M: 0.887). We
also develop a distinction between meta-learners that operate per-instance
(micro-level), per-data subset (mid-level), and per-dataset (global level). Our
evaluation shows that a hypothetically perfect micro-level meta-learner would
improve RMSE by 25.5% for the MovieLens 100K and 1M datasets, compared to the
overall-best algorithms used.
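The "hypothetically perfect micro-level meta-learner" bound can be computed directly from the base algorithms' predictions: per instance, keep the error of whichever algorithm comes closest to the truth. A sketch with hypothetical data; the layout (one row of predictions per algorithm) is an assumption.

```python
import numpy as np

def oracle_rmse(true_ratings, predictions_by_algo):
    """RMSE of a hypothetically perfect micro-level meta-learner: for every
    user-item instance, select whichever base algorithm's prediction is
    closest to the true rating, then compute RMSE over those selections."""
    errors = np.abs(np.asarray(predictions_by_algo, dtype=float)
                    - np.asarray(true_ratings, dtype=float))
    best = errors.min(axis=0)   # per-instance error of the best algorithm
    return float(np.sqrt((best ** 2).mean()))
```

Comparing this oracle value against each single algorithm's RMSE gives the potential improvement (25.5% on MovieLens in the paper).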
Exploring Choice Overload in Related-Article Recommendations in Digital Libraries
We investigate the problem of choice overload - the difficulty of making a
decision when faced with many options - when displaying related-article
recommendations in digital libraries. So far, research on how many
items should be displayed has mostly been done in the fields of media
recommendations and search engines. We analyze the number of recommendations in
current digital libraries. When browsing full screen on a laptop or desktop
PC, all of the analyzed libraries display a fixed number of recommendations.
72% display three, four, or five recommendations; none display more than ten.
We provide results from an
empirical evaluation conducted with GESIS' digital library Sowiport, with
recommendations delivered by recommendations-as-a-service provider Mr. DLib. We
use click-through rate as a measure of recommendation effectiveness based on
3.4 million delivered recommendations. Our results show lower click-through
rates for higher numbers of recommendations, but twice as many clicked
recommendations in total when displaying ten related articles instead of one.
Our results indicate that users might quickly feel overloaded by choice.
Comment: Accepted for publication at the 5th International Workshop on
Bibliometric-enhanced Information Retrieval (BIR 2017).
Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems
The effectiveness of recommender system algorithms varies in different
real-world scenarios. It is difficult to choose a best algorithm for a scenario
due to the quantity of algorithms available, and because of their varying
performances. Furthermore, it is not possible to choose one single algorithm
that will work optimally for all recommendation requests. We apply
meta-learning to this problem of algorithm selection for scholarly article
recommendation. We train a random forest, gradient boosting machine, and
generalized linear model, to predict a best-algorithm from a pool of content
similarity-based algorithms. We evaluate our approach on an offline dataset for
scholarly article recommendation and attempt to predict the best algorithm
per-instance. The best meta-learning model achieved an average increase in F1
of 88% compared to the average F1 of all base-algorithms (F1: 0.0708 vs.
0.0376), and it selected the correct base-algorithm significantly more often
than chance (paired t-test; p < 0.1). The meta-learner had a 3% higher F1 than
the single-best base-algorithm (F1: 0.0739 vs. 0.0717). We further perform an
online evaluation of our approach, conducting an A/B test through our
recommender-as-a-service platform Mr. DLib. We deliver 148K recommendations to
users between January and March 2019. User engagement was significantly higher
for recommendations generated with our meta-learning approach than for a random
selection of algorithms (click-through rate (CTR): 0.51% vs. 0.44%; chi-squared
test, p < 0.1); however, our approach did not produce a higher CTR than the
best algorithm alone (CTR of MoreLikeThis (Title): 0.58%).
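Per-instance selection can be framed as a classifier mapping meta-features of a recommendation request to the label of the best base algorithm. The paper trains random forests, GBMs, and GLMs; the dependency-light 1-nearest-neighbour selector below is a stand-in to illustrate the setup, not the authors' model.

```python
import numpy as np

class OneNNMetaLearner:
    """Per-instance algorithm selection as classification: each training row
    is a meta-feature vector for one recommendation request, labelled with
    the base algorithm that performed best on it. A 1-NN selector stands in
    for the paper's random forest / GBM / GLM meta-learners."""

    def fit(self, X, best_algo):
        self._X = np.asarray(X, dtype=float)
        self._y = list(best_algo)
        return self

    def predict(self, X):
        # For each request, copy the label of the closest training request.
        return [self._y[int(np.argmin(((self._X - row) ** 2).sum(axis=1)))]
                for row in np.asarray(X, dtype=float)]
```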